Parallelizing Irregular Applications through the YAPPA Compilation Framework
Authors
Abstract
Modern High Performance Computing (HPC) clusters are composed of hundreds of nodes integrating multicore processors with advanced cache hierarchies. These systems can reach several petaflops of peak performance, but they are optimized for floating-point-intensive applications and regular, localizable data structures, and their network interconnects are optimized for bulk, synchronous transfers. On the other hand, many emerging classes of scientific applications (e.g., computer vision, machine learning, data mining) are irregular [1]. They exploit dynamic, linked data structures (e.g., graphs, unbalanced trees, unstructured grids). Such applications are inherently parallel, since the computation needed for each element of the data structures is potentially concurrent; however, these data structures are subject to unpredictable, fine-grained accesses, exhibit almost no locality, and present high synchronization intensity. Distributed-memory systems are naturally programmed with the Message Passing Interface (MPI), usually under a Single Program, Multiple Data (SPMD) control model: at the beginning of the application, each node is associated with a process that operates on its own chunk of data, and communication happens only in precise application phases. Developing irregular applications with these models on distributed systems poses complex challenges and requires significant programming effort: their datasets are very difficult to partition in a balanced way, so shared-memory abstractions, such as the Partitioned Global Address Space (PGAS), are preferred. In this work we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework based on the LLVM compiler for the automatic parallelization of irregular applications on modern HPC systems. We briefly introduce an efficient parallel programming approach for these applications on distributed-memory systems. We propose a set of compiler transformations for automatic parallelization, which can reduce development and optimization effort, and a set of transformations for improving the performance of the resulting parallel code, focusing on irregular applications. We implemented these transformations in LLVM and evaluated a first prototype of the framework on a common irregular kernel, graph Breadth-First Search (BFS).
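The abstract evaluates the framework on graph BFS. As a minimal sketch (not the paper's code; all names are hypothetical), a sequential, level-synchronous BFS over a graph in Compressed Sparse Row (CSR) form illustrates the irregular behavior described above: the reads through adj[e] are data-dependent and have almost no locality, and the check-and-update of level[v] is exactly the fine-grained synchronization point a parallel version must handle.

    #include <cstdint>
    #include <queue>
    #include <vector>

    // Level-synchronous BFS over a CSR graph: row[u]..row[u+1] delimits the
    // neighbors of vertex u inside the adjacency array adj.
    std::vector<int64_t> bfs(const std::vector<int64_t>& row,
                             const std::vector<int64_t>& adj,
                             int64_t source) {
        std::vector<int64_t> level(row.size() - 1, -1);  // -1 marks "unvisited"
        std::queue<int64_t> frontier;
        level[source] = 0;
        frontier.push(source);
        while (!frontier.empty()) {
            int64_t u = frontier.front();
            frontier.pop();
            for (int64_t e = row[u]; e < row[u + 1]; ++e) {
                int64_t v = adj[e];        // data-dependent, low-locality access
                if (level[v] == -1) {      // needs atomicity if parallelized
                    level[v] = level[u] + 1;
                    frontier.push(v);
                }
            }
        }
        return level;
    }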
Similar Resources
A Compilation Framework for Irregular Memory Accesses on the Cell Broadband Engine
A class of scientific problems represents a physical system in the form of sparse and irregular kernels. Parallelizing scientific applications that comprise sparse data structures on the Cell Broadband Engine (Cell BE) is a challenging problem, as the memory access pattern is irregular and cannot be determined at compile time. In this paper we present a compiler framework for the Cell BE that...
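The excerpt stops short of the framework itself. As an illustrative sketch under the same assumptions (CSR storage; hypothetical names, not from that paper), a sparse matrix-vector product shows the kind of access that cannot be determined at compile time: which elements of x an iteration reads depends on the index array col, which is known only at run time.

    #include <cstddef>
    #include <vector>

    // y = A*x with A stored in CSR form. The read x[col[e]] is an indirect,
    // data-dependent access: a static compiler cannot tell which parts of x
    // each row touches, so it cannot partition or prefetch them ahead of time.
    void spmv(const std::vector<int>& row, const std::vector<int>& col,
              const std::vector<double>& val, const std::vector<double>& x,
              std::vector<double>& y) {
        for (std::size_t i = 0; i + 1 < row.size(); ++i) {
            double sum = 0.0;
            for (int e = row[i]; e < row[i + 1]; ++e)
                sum += val[e] * x[col[e]];  // irregular access resolved at run time
            y[i] = sum;
        }
    }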
Automatic Parallelization of Irregular and Pointer-Based Computations: Perspectives from Logic and Constraint Programming
Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence det...
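As a minimal illustration of that difficulty (hypothetical types, not from the paper), consider a traversal over a pointer-linked list: without shape or alias analysis, the compiler cannot prove that the nodes are distinct, so it cannot establish that the iterations are independent and safe to parallelize.

    // A pointer-based dynamic data structure of the kind the excerpt mentions.
    struct Node {
        int   value;
        Node* next;
    };

    // Each iteration is independent only if the list is acyclic and no two
    // pointers alias the same node; the compiler must prove this, not assume it.
    void scale_all(Node* head, int factor) {
        for (Node* n = head; n != nullptr; n = n->next)
            n->value *= factor;
    }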
Run-Time Techniques for Parallelizing Sparse Matrix Problems
Sparse matrix problems are difficult to parallelize efficiently on message-passing machines, since they access data through multiple levels of indirection. Inspector-executor strategies, which are typically used to parallelize such problems, impose significant preprocessing overheads. This paper describes the runtime support required by new compilation techniques for sparse matrices and evaluates the...
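As a hedged sketch of the inspector-executor idea (names hypothetical, not that paper's runtime): the inspector resolves the indirection index[i] before the computation starts, yielding a schedule of touched elements; the executor then performs the actual work against that schedule. The inspector pass is the preprocessing overhead the excerpt refers to.

    #include <cstddef>
    #include <vector>

    // The schedule produced by the inspector: once the accesses are resolved,
    // they can be partitioned among processors and the needed data fetched.
    struct Schedule {
        std::vector<std::size_t> touched;
    };

    // Inspector: run through the indirection once to record which elements
    // the computation will access.
    Schedule inspect(const std::vector<std::size_t>& index) {
        Schedule s;
        s.touched.assign(index.begin(), index.end());
        return s;
    }

    // Executor: perform the real computation using the precomputed schedule.
    void execute(const Schedule& s, std::vector<double>& data, double delta) {
        for (std::size_t i : s.touched)
            data[i] += delta;
    }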
A general compilation algorithm to parallelize and optimize counted loops with dynamic data-dependent bounds
We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamically computed, data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable...
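A tiny example (hypothetical names) of the loop class studied above: the trip count n is computed at run time from data, but it is fixed before the loop begins, so the induction variable i still has a closed form, unlike a general while loop whose exit test is re-evaluated from state mutated inside the body.

    #include <cstddef>
    #include <vector>

    // The upper bound n = bound[k] is data-dependent yet loop-invariant:
    // once the loop is entered, its trip count is known, which keeps
    // transformations such as induction-variable substitution applicable.
    void accumulate(const std::vector<int>& bound, const std::vector<double>& in,
                    std::vector<double>& out, std::size_t k) {
        const int n = bound[k];          // dynamically computed bound
        for (int i = 0; i < n; ++i)      // counted loop with fixed trip count
            out[k] += in[i];
    }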